Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test time spines for sub-daily granularity #1358

Merged
merged 6 commits into from
Jul 30, 2024

Conversation

courtneyholcomb
Copy link
Contributor

@courtneyholcomb courtneyholcomb commented Jul 27, 2024

Add test time spines for sub-daily granularity
I added as much data as seemed reasonable to include manually, and figure we can focus tests to those time ranges. And we always have the flexibility to add more data later.
I included one time spine per sub-daily grain so that we could be sure to test the syntax for each grain against each engine that supports it.
Snapshot changes are separated into isolated commits for easier review. They only include the following changes:

  • Node ID changes due to new time spine source nodes
  • Time spine table alias changes due to new time spine source nodes

@courtneyholcomb courtneyholcomb added Reload Test Data in SQL Engines Should be run when test data changes Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment labels Jul 27, 2024
@courtneyholcomb courtneyholcomb added Reload Test Data in SQL Engines Should be run when test data changes and removed Reload Test Data in SQL Engines Should be run when test data changes labels Jul 27, 2024
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 27, 2024 17:35 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 27, 2024 17:35 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 27, 2024 17:35 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 27, 2024 17:35 — with GitHub Actions Inactive
@github-actions github-actions bot removed the Reload Test Data in SQL Engines Should be run when test data changes label Jul 27, 2024
@courtneyholcomb courtneyholcomb force-pushed the court/test-time-spines branch from 88dbe8b to fdcf983 Compare July 29, 2024 21:53
@courtneyholcomb courtneyholcomb added Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment and removed Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment labels Jul 29, 2024
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 29, 2024 22:00 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 29, 2024 22:00 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb force-pushed the court/test-time-spines branch from eb31d9f to b12aebb Compare July 29, 2024 22:45
@courtneyholcomb courtneyholcomb added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Jul 29, 2024
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 29, 2024 22:45 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 29, 2024 22:45 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 29, 2024 22:45 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 29, 2024 22:45 — with GitHub Actions Inactive
@github-actions github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Jul 29, 2024
@courtneyholcomb courtneyholcomb force-pushed the court/test-time-spines branch from b12aebb to b1ec4d6 Compare July 29, 2024 23:13
@courtneyholcomb courtneyholcomb added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Jul 29, 2024
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 29, 2024 23:13 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 29, 2024 23:13 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 29, 2024 23:13 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 29, 2024 23:13 — with GitHub Actions Inactive
@github-actions github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Jul 29, 2024
These are breaking for Trino, so I'm removing them for now. I have a task up to put up equivalent tests in SQL rendering section instead, where I can specify to skip for Trino.
@courtneyholcomb courtneyholcomb force-pushed the court/test-time-spines branch from b1ec4d6 to 2ef8de8 Compare July 29, 2024 23:27
@courtneyholcomb courtneyholcomb added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Jul 29, 2024
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 29, 2024 23:27 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 29, 2024 23:27 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 29, 2024 23:27 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS July 29, 2024 23:27 — with GitHub Actions Inactive
@github-actions github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Jul 29, 2024
Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question about the fallback behavior:

What happens when a user adds an HOUR grain time spine but not a DAY? What if they add a MONTH grain time spine but not a DAY?

- ["2020-01-01 00:00:00.000026"]
- ["2020-01-01 00:00:00.000027"]
- ["2020-01-01 00:00:00.000028"]
- ["2020-01-01 00:00:00.000029"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want values from multiple days, even though they won't be contiguous in the input.

I'm not sure if this really matters but I'm always wary of having test data pegged to a boundary (in this case, a year boundary).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough - I can update that tomorrow! Shouldn't impact any of the tests, just will need to repopulate the source schemas.

- name: ts
type: TIME
rows:
- ["2020-01-01 01:00:00"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. I just realized, are we going to date_trunc the time spine input to the specified grain? I'm pretty sure we don't do it today, but there's a type for it (DATE). The spec calls for the end user to configure that correctly, so I'm inclined not to date_trunc right now, but it might be something we need to do.

Presumably most people are using packages to build these things so maybe we just rely on that. If we're worried but not very worried about this we could also set up a best-effort warehouse validation, or release a validation package for time spine models that people can use if they wish.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't apply that DATE_TRUNC in JoinToTimeSpineNode or JoinOverTimeRangeNode, But we do in ReadSqlSourceNode & MetricTimeDimensionTransformNode. This feature doesn't change that behavior so far, but we could discuss if we want to change it.
I think the warehouse validations will be a good idea so we can have more efficient queries.

Comment on lines +101 to +105
time_spine_sources = {
legacy_time_spine_grain: TimeSpineSource(
schema_name=mf_test_configuration.mf_source_schema, table_name=time_spine_base_table_name
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, does this mean that we're no longer triggering the fallback behavior in the runtime, because we're overriding it with an explicit time spine input?

What happens if I define an HOUR grain time spine but leave the DAY grain one in the original configuration? Does that fail unceremoniously, do we raise an informative error, or do we just use the HOUR spine for everything?

I'm actually fine with raising an informative error or using the HOUR spine, especially for now, I just realized I'm just not clear on what happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The legacy time spine config will only be included in your manifest if you have the metricflow_time_spine model in your project. It will be overridden if you add configs pointing to a time spine with DAY. The legacy time spine will be treated like any other time spine you have configured.
If you have a new time spine with HOUR grain in addition to the legacy DAY time spine, we'll use the DAY time spine if you query something with DAY or higher. In that case we would only use the HOUR time spine if you query with HOUR. We'll choose the time spine with the largest grain available that can resolve the requested grain.

type: simple
type_params:
measure: bookings
filter: "{{ TimeDimension('metric_time') }} < '2012-12-20'"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I know why this is. Timestamp literals in Trino have to take the form TIMESTAMP <literal> so we'd need to do some substitution somewhere.

If the error is due to the filter it means we have to do custom rendering against the filter expr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I couldn't come up with an easy way to fix it here and wasn't sure it was worth doing the hard fix for an engine we barely use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I think the right way to deal with this is via a more holistic approach to filter expression inputs. In the meantime, having a small gap in Trino test coverage - particularly one where the main difference essentially boils down to how we go about pasting in the user-provided expression at render time - seems fine to me.

@courtneyholcomb
Copy link
Contributor Author

What happens when a user adds an HOUR grain time spine but not a DAY? What if they add a MONTH grain time spine but not a DAY?

@tlento If they configure HOUR but not DAY (and they don't have the legacy time spine model in their project), we'll use HOUR to resolve queries of all grains larger than or equal to HOUR.
If they configure MONTH only (with no legacy time spine), I actually set up the core parser to error so they won't be able to parse successfully. We're requiring at least one time spine using DAY or higher. (Previously, we would error during parsing if they didn't have the metricflow_time_spine model in their project specifically.)

Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that all makes sense. We should follow up on:

  1. granularity validations/date_trunc behaviors for time spine inputs (I favor validation, I think, as littering the codebase with date_trunc calls is kind of crappy)
  2. longer term approach to default fallback interactions. For now this feels like a documentation problem but we may need to refine some of our internals a bit.

I think both of these will come up with custom calendar so we'll have an opportunity to re-visit these in the coming months.

@courtneyholcomb
Copy link
Contributor Author

Ok, that all makes sense. We should follow up on:

  1. granularity validations/date_trunc behaviors for time spine inputs (I favor validation, I think, as littering the codebase with date_trunc calls is kind of crappy)
  2. longer term approach to default fallback interactions. For now this feels like a documentation problem but we may need to refine some of our internals a bit.

I think both of these will come up with custom calendar so we'll have an opportunity to re-visit these in the coming months.

@tlento Sounds good! I'll put up tasks for those in the Wrap Up milestone for this project, and we can move them to the custom calendar project if we see fit later.
If you don't mind I'm going to change the date ranges of the time spine data in a follow up PR to avoid interfering with reviewability of the other PRs since this is at the bottom of the stack.

Makes some updates to enable no-metric queries with sub-daily time
dimensions and adds integration tests for those queries.
@courtneyholcomb courtneyholcomb enabled auto-merge (squash) July 30, 2024 21:02
@courtneyholcomb courtneyholcomb merged commit b72ac67 into main Jul 30, 2024
15 checks passed
@courtneyholcomb courtneyholcomb deleted the court/test-time-spines branch July 30, 2024 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants